## 1 Sources of Performance

### 1.1 Assumptions

#### **Ideal CPU:**

- All data lives in a single memory
- Operations are reads, writes, and logic, all performed on data
- Operations execute strictly one after another

$\rightarrow$ predictable performance

#### **Actual CPU:**

- Different levels of cache $\rightarrow$ the time an instruction takes is unclear (it depends on where the data currently sits)
- Operations are done in parallel $\rightarrow$ the strict one-after-another ordering no longer holds
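The cache effect can be illustrated with a minimal sketch in C. Both functions below (hypothetical names, not from the lecture) read exactly the same elements and return the same sum; only the traversal order differs. On typical cached hardware the column-wise version tends to be slower for large matrices because consecutive accesses are far apart in memory, but the actual timing depends on the machine.

```c
#include <stddef.h>

/* Sum an n x n matrix stored row by row, walking memory contiguously. */
long sum_row_major(const int *m, size_t n) {
    long s = 0;
    for (size_t r = 0; r < n; ++r)
        for (size_t c = 0; c < n; ++c)
            s += m[r * n + c];   /* neighbouring addresses: cache friendly */
    return s;
}

/* Same sum, but walking column by column: each access jumps n elements. */
long sum_col_major(const int *m, size_t n) {
    long s = 0;
    for (size_t c = 0; c < n; ++c)
        for (size_t r = 0; r < n; ++r)
            s += m[r * n + c];   /* stride of n ints: more cache misses for large n */
    return s;
}
```

The instruction count is the same in both versions, so the ideal-CPU model would predict identical run time.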

## 2 Optimization

#### Intrinsics

Intrinsics are function-like calls written inline in the code that map directly to low-level instructions. They are hardware-specific.
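As a sketch of what this looks like in practice, the following C function adds two float arrays using x86 SSE intrinsics (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps` from `<xmmintrin.h>`). The function name `add_arrays` and the `HAVE_SSE` macro are assumptions made for this example. Because intrinsics are hardware-specific, the code falls back to a plain loop when SSE is not available.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* x86 SSE intrinsics */
#define HAVE_SSE 1       /* hypothetical helper macro for this sketch */
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#ifdef HAVE_SSE
    /* Process 4 floats per iteration with one 128-bit add. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* add and store 4 results */
    }
#endif
    /* Scalar loop: handles the remainder, or everything without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The same source would need different intrinsics (or none) on another architecture, which is the cost of this kind of optimization.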

High-level code can also be optimized, for example with loop unrolling: writing several statements in each loop iteration, because if the processor can handle, for instance, 4 statements at a time, this is more efficient.
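A minimal sketch of loop unrolling by a factor of 4 in C follows. The function names are hypothetical, and the factor 4 is taken from the example above rather than from any particular processor. The unrolled version uses four independent accumulators so that the additions inside one iteration do not depend on each other.

```c
#include <stddef.h>

/* Plain loop: one addition per iteration, each depending on the previous. */
float sum_simple(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Unrolled by 4: four independent additions per iteration. */
float sum_unrolled4(const float *a, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        s += a[i];
    return s;
}
```

Compilers often unroll loops on their own at higher optimization levels, so whether manual unrolling helps should be measured. Note also that the unrolled version adds the floats in a different order, which can change the rounding of the result slightly.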